Night Shift: fix full table scan in word details query #24

Open
EtanHey wants to merge 1 commit into master from nightshift/2026-03-01-5693

Conversation

@EtanHey (Owner) commented Mar 1, 2026

User description

Automated improvement by Golems Night Shift.

fix full table scan in word details query


PR Type

Enhancement, Bug fix


Description

  • Add by_persian index to words table for O(log n) lookup

  • Fix full table scan in getWordDetailsWithSongs query

  • Refactor code formatting for consistency across wordProgress queries


Diagram Walkthrough

```mermaid
flowchart LR
  A["words table"] -->|"by_persian index"| B["getWordDetailsWithSongs query"]
  C["Full table scan"] -->|"replaced with"| D["Indexed lookup"]
  B -->|"O(log n) lookup"| E["Improved performance"]
```

File Walkthrough

Relevant files
Enhancement: convex/schema.ts (+3/-1)
Add by_persian index to words table

  • Added by_persian index to the words table schema
  • Index enables efficient lookup of words by their Persian text
  • Reformatted index definitions for better readability
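
For concreteness, the schema change presumably has this shape. This is a sketch, not the repository's actual file: every field and index other than by_persian is a hypothetical placeholder.

```typescript
// Sketch of the words table in convex/schema.ts. Only the by_persian
// index is from this PR; the fields and the other index are
// illustrative guesses, not the real schema.
words: defineTable({
  persian: v.string(),    // the Persian text the new index keys on
  songId: v.id("songs"),  // placeholder relation field
  lineNumber: v.number(), // placeholder position field
})
  .index("by_song", ["songId"])      // illustrative pre-existing index
  .index("by_persian", ["persian"]), // the index added in this PR
```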
Bug fix: convex/wordProgress.ts (+29/-44)
Fix word details query and format code consistently

  • Fixed getWordDetailsWithSongs query to use the by_persian index instead of a full table scan
  • Replaced ctx.db.query("words").collect() with an indexed query using withIndex("by_persian")
  • Applied consistent code formatting across multiple query functions
  • Reformatted multi-line query chains and object literals for consistency

Summary by CodeRabbit

  • Bug Fixes

    • Fixed handling for missing word records; new entries are now created when needed instead of failing silently.
  • Refactor

    • Optimized data retrieval performance by using targeted index queries instead of in-memory filtering.
    • Improved code organization and formatting across data access layers.

The getWordDetailsWithSongs query was loading ALL words from the database
and filtering in memory. As the words table grows, this becomes a major
performance bottleneck, since the query runs every time a user opens the
word details modal. It now uses a by_persian index for O(log n) lookup.
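
The scan-vs-lookup contrast can be illustrated in plain TypeScript, with a Map standing in for Convex's by_persian index. The Word type and sample rows below are hypothetical, and a Map gives O(1) rather than the O(log n) of a real database index, but the shape of the fix is the same.

```typescript
// Illustrative stand-in, not the app's actual Convex code.
type Word = { persian: string; songId: string };

const words: Word[] = [
  { persian: "سلام", songId: "song1" },
  { persian: "دل", songId: "song2" },
];

// Before: O(n) — load every row and filter in memory.
function findByScan(persian: string): Word[] {
  return words.filter((w) => w.persian === persian);
}

// After: consult a prebuilt index keyed on `persian`.
const byPersian = new Map<string, Word[]>();
for (const w of words) {
  const bucket = byPersian.get(w.persian) ?? [];
  bucket.push(w);
  byPersian.set(w.persian, bucket);
}
function findByIndex(persian: string): Word[] {
  return byPersian.get(persian) ?? [];
}

console.log(findByScan("دل").length, findByIndex("دل").length); // both find the same row
```

The scan's cost grows with the whole table; the indexed lookup's cost depends only on how many rows match the key.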

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: songscript · Deployment: Building · Actions: Preview, Comment · Updated (UTC): Mar 1, 2026 2:07am

@coderabbitai

coderabbitai bot commented Mar 1, 2026

📝 Walkthrough

Walkthrough

The changes add a new database index on the persian field of the words table and optimize word progress data retrieval operations. A secondary modification introduces explicit record creation logic in mutation operations when no existing word progress entries match the given word identifier.

Changes

Cohort / File(s) Summary
Database Schema
convex/schema.ts
Added new index by_persian on the words table to index the persian field alongside existing indexes.
Word Progress Optimization
convex/wordProgress.ts
Refactored query chains to use indexed by_persian queries instead of in-memory filtering for improved data retrieval efficiency. Extended toggleLearned and setLearned mutations to explicitly create new wordProgress records when no matching records exist for a given word, with formatting adjustments throughout.
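
The create-when-missing behavior described for toggleLearned can be sketched with an in-memory stand-in for the wordProgress table. The names, record shape, and keying on persian text below are hypothetical; this is not the actual Convex mutation.

```typescript
// Stand-in for the wordProgress table and the upsert-style toggle.
type Progress = { userId: string; persian: string; learned: boolean };
const table: Progress[] = [];

// Toggle the learned flag; create the record if none exists yet.
function toggleLearned(userId: string, persian: string): Progress {
  const existing = table.find(
    (p) => p.userId === userId && p.persian === persian,
  );
  if (existing) {
    existing.learned = !existing.learned;
    return existing;
  }
  const created = { userId, persian, learned: true };
  table.push(created);
  return created;
}

console.log(toggleLearned("u1", "دل").learned); // true (created)
console.log(toggleLearned("u1", "دل").learned); // false (toggled)
```

The point of the change, per the walkthrough, is that the missing-record case now creates an entry instead of failing silently.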

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A persian index blooms so bright,
Queries now fly with indexed might,
When no record dares to exist,
We create it—logic won't resist!
Words progress through optimized space,
With grace and speed we quicken the pace. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title directly and accurately describes the main change: fixing a full table scan inefficiency in the word details query by adding an index for targeted lookups instead of in-memory filtering.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢 No security concerns identified. No security vulnerabilities were detected by AI analysis; human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No mutation audit log: The newly-modified mutations update user learning/progress state (e.g., learned toggles
and count increments) without any explicit audit logging, which may violate audit trail
requirements depending on whether this data is considered critical/sensitive in your
compliance scope.

Referred Code
export const incrementViewCount = mutation({
  args: { wordId: v.id("words"), persian: v.string() },
  handler: async (ctx, args) => {
    const userId = await requireAuth(ctx);

    // Look up by persian text first - this is the canonical key
    const existingPersian = await ctx.db
      .query("wordProgress")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .filter((q) => q.eq(q.field("persian"), args.persian))
      .first();

    if (existingPersian) {
      // Update the existing record (shared across all instances of this word)
      await ctx.db.patch(existingPersian._id, {
        viewCount: existingPersian.viewCount + 1,
        lastSeen: Date.now(),
      });
      return existingPersian._id;
    } else {
      // No record exists for this persian word - create one


 ... (clipped 126 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Improve query efficiency and completeness

To improve efficiency and correctness, refactor the logic to iterate directly
over matchingWords instead of unique songIds, and use a cache to avoid
refetching song data. This ensures all word occurrences are processed.

convex/wordProgress.ts [485-515]

-// Get unique song IDs
-const songIds = [...new Set(matchingWords.map((w) => w.songId))];
+// Get song details for each song
+const songCache = new Map();
+const songsWithContext = await Promise.all(
+  matchingWords.map(async (word) => {
+    let song = songCache.get(word.songId);
+    if (!song) {
+      song = await ctx.db.get(word.songId);
+      if (!song) return null;
+      songCache.set(word.songId, song);
+    }
 
-// Get song details for each song
-const songsWithContext = await Promise.all(
-  songIds.map(async (songId) => {
-    const song = await ctx.db.get(songId);
-    if (!song) return null;
-
-    // Find the specific word instance in this song to get line number
-    const wordInSong = matchingWords.find((w) => w.songId === songId);
-    const lineNumber = wordInSong?.lineNumber ?? 0;
+    const lineNumber = word.lineNumber;
 
     // Get the line context
     const line = await ctx.db
       .query("lyrics")
       .withIndex("by_song", (q) =>
-        q.eq("songId", songId).eq("lineNumber", lineNumber),
+        q.eq("songId", word.songId).eq("lineNumber", lineNumber),
       )
       .first();
 
     return {
-      songId,
+      songId: word.songId,
       title: song.title,
       artist: song.artist,
       sourceLanguage: song.sourceLanguage,
       lineNumber,
       linePreview: line?.original?.slice(0, 50) || "",
     };
   }),
 );

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a logical bug where only the first occurrence of a word in a song is processed, and also points out a related performance issue, offering a solution that is both more correct and more efficient.

High
General
Add composite user–persian index

Add a composite index by_user_persian on ["userId", "persian"] to the
wordProgress table to improve query performance.

convex/schema.ts [57-64]

 wordProgress: defineTable({
   userId: v.id("users"),
   wordId: v.id("words"),
   persian: v.string(),
   learned: v.boolean(),
   lastSeen: v.optional(v.number()),
 })
   .index("by_user", ["userId"])
+  .index("by_user_persian", ["userId", "persian"])

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: This suggestion correctly identifies a significant performance optimization opportunity by adding a composite index, which will make several queries in convex/wordProgress.ts much more efficient.

Medium


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
convex/wordProgress.ts (1)

6-17: 🛠️ Refactor suggestion | 🟠 Major

Move modified Convex queries/mutations to songs.ts (or update the guideline).

The changed query/mutation implementations are in convex/wordProgress.ts, but the repository guideline requires queries/mutations in songs.ts.

As per coding guidelines, “Use Convex schema.ts for database schema definitions and songs.ts for queries and mutations.”

Also applies to: 20-32, 35-47, 51-71, 74-92, 96-129, 133-166, 169-216, 219-260, 263-357, 362-467, 469-532, 535-574, 577-605

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/wordProgress.ts` around lines 6 - 17, The Convex query implementations
(e.g., getByUser in convex/wordProgress.ts and the other queries/mutations
listed) must be moved into songs.ts to follow the project guideline; open
songs.ts and copy each query/mutation implementation (keeping function names
like getByUser) into that file, export them from songs.ts, update any imports
that referenced convex/wordProgress.ts to point to songs.ts, and either delete
convex/wordProgress.ts or leave a thin forwarding stub that re-exports the same
names from songs.ts to avoid breaking references; ensure Convex query/mutation
registrations remain intact and run tests/TypeScript build to confirm no
unresolved imports.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93c1185 and d65d19b.

📒 Files selected for processing (2)
  • convex/schema.ts
  • convex/wordProgress.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TanStack Start framework with Bun runtime for the application

Tests must pass locally via bun run test before committing code, as Husky pre-commit hooks will block commits with failing tests

Files:

  • convex/schema.ts
  • convex/wordProgress.ts
convex/schema.ts

📄 CodeRabbit inference engine (AGENTS.md)

Define database schema in convex/schema.ts using Convex

Files:

  • convex/schema.ts
convex/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

Use Convex for database queries, mutations, and authentication configuration

Files:

  • convex/schema.ts
  • convex/wordProgress.ts
**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

All new helpers and utilities MUST have corresponding test files

Files:

  • convex/schema.ts
  • convex/wordProgress.ts
convex/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

convex/**/*.ts: NEVER create .js files in the convex/ folder - only .ts files belong there
Before starting Convex dev server, always run rm -f convex/*.js to clean compiled JavaScript files
Use Convex schema.ts for database schema definitions and songs.ts for queries and mutations

Files:

  • convex/schema.ts
  • convex/wordProgress.ts
🧠 Learnings (4)
📚 Learning: 2026-01-28T12:17:53.877Z
Learnt from: CR
Repo: EtanHey/songscript PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-28T12:17:53.877Z
Learning: Applies to convex/**/*.ts : Use Convex schema.ts for database schema definitions and songs.ts for queries and mutations

Applied to files:

  • convex/schema.ts
  • convex/wordProgress.ts
📚 Learning: 2026-01-23T18:12:38.519Z
Learnt from: CR
Repo: EtanHey/songscript PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-23T18:12:38.519Z
Learning: Applies to convex/schema.ts : Define database schema in convex/schema.ts using Convex

Applied to files:

  • convex/schema.ts
📚 Learning: 2026-01-23T18:12:49.193Z
Learnt from: CR
Repo: EtanHey/songscript PR: 0
File: prd-json/AGENTS.md:0-0
Timestamp: 2026-01-23T18:12:49.193Z
Learning: Applies to prd-json/**/convex/**/*{score,progress,leaderboard}*.{ts,tsx,js} : Use Progress Score Formula: (words_learned × multiplier) + (lines_completed × multiplier × 0.5)

Applied to files:

  • convex/wordProgress.ts
📚 Learning: 2026-01-23T18:12:38.519Z
Learnt from: CR
Repo: EtanHey/songscript PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-23T18:12:38.519Z
Learning: Applies to convex/*.ts : Use Convex for database queries, mutations, and authentication configuration

Applied to files:

  • convex/wordProgress.ts
🔇 Additional comments (2)
convex/schema.ts (1)

51-53: Nice optimization: by_persian index is the right fix for this lookup path.

This directly removes the prior scan-heavy pattern for Persian word lookups in modal flows.

convex/wordProgress.ts (1)

473-476: Great improvement on word-details lookup path.

Switching to words.by_persian directly addresses the full-scan bottleneck in this modal query.

Comment on lines 61 to 65
const progress = await ctx.db
  .query("wordProgress")
  .withIndex("by_user", (q) => q.eq("userId", userId))
  .filter((q) => q.eq(q.field("persian"), persian))
  .first();

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check current indexes in convex/schema.ts for wordProgress
rg -n 'wordProgress.*defineTable|\.index\(' convex/schema.ts -A 40 | head -100

Repository: EtanHey/songscript

Length of output: 4583


🏁 Script executed:

# Check the specific patterns at the mentioned lines in convex/wordProgress.ts
rg -n '\.withIndex\("by_user"' convex/wordProgress.ts -C 3

Repository: EtanHey/songscript

Length of output: 3042


🏁 Script executed:

# Check if composite indexes are already defined
rg -n '\.index\("by_user' convex/schema.ts -A 2

Repository: EtanHey/songscript

Length of output: 1033


🏁 Script executed:

# Verify the coding guideline - check if there are queries/mutations in wordProgress.ts vs songs.ts
echo "=== Exports in convex/wordProgress.ts ===" && \
rg -n '^export (const|function|default)' convex/wordProgress.ts | head -20 && \
echo "" && \
echo "=== Exports in convex/songs.ts ===" && \
rg -n '^export (const|function|default)' convex/songs.ts | head -20

Repository: EtanHey/songscript

Length of output: 1092


Add composite indexes by_user_word and by_user_persian to eliminate post-index filtering scans.

The by_user index is single-column only. All flagged call sites use .withIndex("by_user").filter(...) patterns, which scan all user rows before applying the filter—causing O(user_rows) performance degradation in hot paths. The codebase already defines composite indexes for visitorId-based lookups (by_visitor_word, by_visitor_persian); apply the same pattern for userId.

Required schema changes
# convex/schema.ts (wordProgress indexes)
 .index("by_user", ["userId"])
+.index("by_user_word", ["userId", "wordId"])
+.index("by_user_persian", ["userId", "persian"]),
# convex/wordProgress.ts examples
- .withIndex("by_user", (q) => q.eq("userId", userId))
- .filter((q) => q.eq(q.field("wordId"), wordId))
+ .withIndex("by_user_word", (q) =>
+   q.eq("userId", userId).eq("wordId", wordId)
+ )

- .withIndex("by_user", (q) => q.eq("userId", userId))
- .filter((q) => q.eq(q.field("persian"), persian))
+ .withIndex("by_user_persian", (q) =>
+   q.eq("userId", userId).eq("persian", persian)
+ )

Affected locations: lines 28–30, 43–45, 63–65, 84–86, 104–106, 141–143, 177–179, 227–229, 552–554.
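
The performance difference this comment describes can be illustrated with plain TypeScript stand-ins: Map keys play the role of Convex index keys, and the rows below are hypothetical.

```typescript
// Single-field index + post-index filter vs. a composite index.
type Row = { userId: string; persian: string; learned: boolean };
const rows: Row[] = [
  { userId: "u1", persian: "دل", learned: true },
  { userId: "u1", persian: "سلام", learned: false },
  { userId: "u2", persian: "دل", learned: false },
];

// by_user: the index narrows to the user's rows, then every one of
// them is scanned to apply the persian filter — O(user_rows).
function viaSingleIndex(userId: string, persian: string): Row | undefined {
  const userRows = rows.filter((r) => r.userId === userId); // index range
  return userRows.find((r) => r.persian === persian); // post-index scan
}

// by_user_persian: the composite key answers the lookup directly.
// "\u0000" is an assumed separator that does not appear in the values.
const byUserPersian = new Map<string, Row>();
for (const r of rows) byUserPersian.set(`${r.userId}\u0000${r.persian}`, r);
function viaCompositeIndex(userId: string, persian: string): Row | undefined {
  return byUserPersian.get(`${userId}\u0000${persian}`);
}

console.log(viaSingleIndex("u1", "دل") === viaCompositeIndex("u1", "دل")); // true
```

Both paths return the same record; the composite index just avoids scanning the rest of the user's rows to find it.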

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/wordProgress.ts` around lines 61 - 65, The queries against the
"wordProgress" table use .withIndex("by_user").filter(...) which forces a
post-index scan; add composite indexes named by_user_word (userId + word) and
by_user_persian (userId + persian) to the Convex schema and replace the
.withIndex("by_user") calls with .withIndex("by_user_word") or
.withIndex("by_user_persian") at the affected call sites (e.g., the query in
wordProgress.ts that calls ctx.db.query("wordProgress").withIndex("by_user", (q)
=> q.eq("userId", userId)).filter((q) => q.eq(q.field("persian"),
persian)).first()) so each query can match the composite index instead of
filtering after the index; update all listed locations (the other
withIndex("by_user").filter usages) to pick the correct composite index for
their filter key.
